Deferred Maintenance of Disk-Based Random Samples
Authors
Abstract
Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file, which follows the intuition that only a “sample” of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm that updates the sample using only fast sequential disk access and does not require any main memory. We conducted an extensive set of experiments and found that our algorithms reduce maintenance cost by several orders of magnitude.
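The intuition that only a "sample" of the operations needs to be logged can be illustrated with textbook reservoir sampling. The sketch below is not the paper's algorithm: it assumes an insert-only workload, keeps the sample and the log in Python lists as stand-ins for disk files, and uses random slot replacement at refresh time rather than the sequential-access refresh the abstract describes. The class name `DeferredReservoir` and its methods are hypothetical.

```python
import random


class DeferredReservoir:
    """Uniform fixed-size sample over an insert-only stream, refreshed lazily.

    Illustrative only: lists stand in for the on-disk sample and log file.
    """

    def __init__(self, size, seed=None):
        self.size = size
        self.sample = []   # stand-in for the on-disk sample
        self.log = []      # stand-in for the candidate log file
        self.seen = 0      # number of inserts observed so far
        self.rng = random.Random(seed)

    def insert(self, item):
        """Log the item only if reservoir sampling would accept it."""
        self.seen += 1
        if self.seen <= self.size:
            # The first `size` items always enter the sample.
            self.log.append(item)
        elif self.rng.random() < self.size / self.seen:
            # Later items are accepted with probability size / seen,
            # so most inserts never reach the log at all.
            self.log.append(item)

    def refresh(self):
        """Apply logged candidates to the sample in arrival order."""
        for item in self.log:
            if len(self.sample) < self.size:
                self.sample.append(item)
            else:
                # Each accepted candidate evicts a uniformly chosen slot.
                self.sample[self.rng.randrange(self.size)] = item
        self.log.clear()
```

Because the evicted slot is chosen independently of the sample's contents, applying the logged candidates later in their original order yields the same distribution as applying each one immediately. For example, after 1,000 inserts into a sample of size 10, the log is expected to hold only a few dozen candidates rather than all 1,000 operations.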
Similar resources
A risk analysis of disk backup or repository maintenance
We discuss a simple model of disk backups and other maintenance processes that include change to computer data. We determine optimal strategies for scheduling such processes. A maximum entropy model of random change provides a simple and intuitive guide to the process of sector based disk change and leads to an easily computable optimum time for backup that is robust to changes in the model. We...
Ranked Subsequence Matching in Time-Series Databases
Existing work on similar sequence matching has focused on either whole matching or range subsequence matching. In this paper, we present novel methods for ranked subsequence matching under time warping, which finds top-k subsequences most similar to a query sequence from data sequences. To the best of our knowledge, this is the first and most sophisticated subsequence matching solution mentione...
The effects of misclassification errors on multiple deferred state attribute sampling plan
The multiple deferred state (MDS) sampling plan by attribute, in which information from the current lot and future lots is used in sentencing the submitted lot, is constructed under the assumption of perfect inspection. In practice, however, inspection may not be free of errors. In this paper, we extend the MDS plan by attribute to the case where misclassification errors occur during inspection. In ...
Maintainability Policy for Deteriorating System with Inspection and Common Cause Failure (TECHNICAL NOTE)
A condition-based preventive and corrective maintenance policy is proposed for a continuously operating system. The condition of the system is assumed to deteriorate with time. The model incorporates both deterioration and random common cause failures. The deterioration stages are modeled as discrete state processes. The system is subjected to random inspection to determine its condition. The mean ...
Successful Combination of Nucleic Acid Amplification Test Diagnostics and Targeted Deferred Neisseria gonorrhoeae Culture.
Nucleic acid amplification tests (NAATs) are recommended for the diagnosis of N. gonorrhoeae infections because of their superior sensitivity. Increasing NAAT use causes a decline in crucial antimicrobial resistance (AMR) surveillance data, which rely on culture. We analyzed the suitability of the ESwab system for NAAT diagnostics and deferred targeted N. gonorrhoeae culture to allow selective ...